-
Notifications
You must be signed in to change notification settings - Fork 544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up cluster merge by batch copying arrays #280
Speed up cluster merge by batch copying arrays #280
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this impressive speedup. I guess the improvement comes from changing the loop over source
with element-wise calls to zarray_add
, to the single memcpy
?
I think the commit could use an explanation of what is going on and where the speed improvements come from.
common/zarray.h
Outdated
@@ -437,26 +437,36 @@ static inline int zarray_index_of(const zarray_t *za, const void *p) | |||
return -1; | |||
} | |||
|
|||
/** | |||
* Add elements from start up to and excluding end from 'source' into 'dest'. | |||
* el_size must be the same for both lists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should "el_size" here be el_sz
instead?
common/zarray.h
Outdated
static inline void zarray_add_all(zarray_t * dest, const zarray_t * source) | ||
{ | ||
assert(dest->el_sz == source->el_sz); | ||
|
||
// Don't allocate on stack because el_sz could be larger than ~8 MB | ||
// stack size | ||
char *tmp = (char*)calloc(1, dest->el_sz); | ||
|
||
for (int i = 0; i < zarray_size(source); i++) { | ||
zarray_get(source, i, tmp); | ||
zarray_add(dest, tmp); | ||
} | ||
|
||
free(tmp); | ||
zarray_add_range(dest, source, 0, source->size); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to keep the function zarray_add_all
if it only maps to zarray_add_range
? We could also just search & replace all calls to zarray_add_all
instead.
merge_clusters was using zarray_add_all which was copying over elements one-by-one doing 2 memcpys and a potential array resize per element. Here we replace it by a range copy that does a single resize and memcpy for the operation which is a lot faster. In my testing it reduces the total runtime for an image that's 2000x3000 by 20%.
0ec3dbb
to
e1b143c
Compare
@christian-rauch addressed your comments! |
I was looking through the code and found that a lot of time was being spent on 'make clusters' so I did a quick look through why that is the case. I found the merge_clusters function to be inefficient, so I optimized it to do batch memcopies wherever possible.
This gives me a ~20% reduction in time spent for a large image.
Before
After